Add crashes_after_fix rule to flag fixed crash bugs still crashing on Nightly #2881
Open
spohlMozilla wants to merge 2 commits into
… Nightly

The macOS and Windows Spotlight teams have repeatedly hit the same workflow gap: a crash gets a speculative fix, the patch lands, the bug is marked RESOLVED FIXED -- and the signature keeps firing on Nightly after the build containing the fix has shipped. With nothing prompting us to re-check crash-stats a few days post-landing, this verification step gets skipped, and we have ended up discovering only much later (in some cases weeks or months) that the speculative fix didn't actually move the crash numbers.

This rule plugs that gap. Once a day it picks RESOLVED FIXED bugs where cf_status_firefox_nightly is "fixed" and cf_last_resolved falls between min_days_since_fix (default 4) and max_days_since_fix (default 10) ago, runs a faceted Socorro SuperSearch over Nightly for the bug's signature(s) starting the day after the fix landed, and -- if min_crash_count (default 5) or more crashes have been recorded in that window -- needinfos the assignee asking whether the fix was incomplete, whether the signature is shared with a different underlying crash, or whether a follow-up is needed.

The four-day floor gives the Nightly build containing the fix time to roll out and accumulate user exposure before the bot will fire. The rule skips bugs that already have any open needinfo, and also skips bugs whose comment history contains the rule's marker phrase, so it only pings the assignee once per fix.
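For readers who want the selection logic spelled out, here is a minimal Python sketch of the window check, the SuperSearch query parameters, and the threshold test described above. It is not the code in this PR: the function names (`in_fix_window`, `supersearch_params`, `should_needinfo`) are hypothetical, and summing the facet counts across all of a bug's signatures is an assumption, since the description does not say whether the threshold applies per signature or in total.

```python
from datetime import date, datetime, timedelta

# Defaults quoted in the PR description.
MIN_DAYS_SINCE_FIX = 4
MAX_DAYS_SINCE_FIX = 10
MIN_CRASH_COUNT = 5


def in_fix_window(last_resolved: datetime, now: datetime,
                  min_days: int = MIN_DAYS_SINCE_FIX,
                  max_days: int = MAX_DAYS_SINCE_FIX) -> bool:
    """True if the bug was resolved between min_days and max_days ago."""
    age = now - last_resolved
    return timedelta(days=min_days) <= age <= timedelta(days=max_days)


def supersearch_params(signatures: list[str], fix_date: date) -> dict:
    """Parameters for a faceted Nightly search starting the day after the fix."""
    start = fix_date + timedelta(days=1)
    return {
        "signature": ["=" + s for s in signatures],  # "=" asks for an exact match
        "release_channel": "nightly",
        "date": ">=" + start.isoformat(),
        "_facets": "signature",
        "_results_number": 0,  # only the facet counts are needed
    }


def should_needinfo(facet_counts: dict[str, int],
                    min_crash_count: int = MIN_CRASH_COUNT) -> bool:
    """Assumption: the threshold applies to the total across all signatures."""
    return sum(facet_counts.values()) >= min_crash_count
```

A bug would be pinged only if `in_fix_window(...)` holds, it has no open needinfo and no earlier marker comment, and `should_needinfo(...)` is true for the counts returned by the query.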
Author
@suhaibmujahid would you mind taking a look when you have a moment? I couldn't add you as a formal reviewer (external-contributor permissions). Thanks!
Contributor
Given we already have
Per marco-c's review feedback: min_crash_count plus the "date >= fix_date + 1 day" Socorro filter already gate pings, so the 4-day floor was redundant. Removing it means the rule fires as soon as the threshold is crossed -- fast-burning regressions get caught earlier, slow-burning ones are still gated by min_crash_count. max_days_since_fix is kept as the upper bound on how long we keep polling a bug whose crash count is still below the threshold.
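The revised gating from this commit can be illustrated with a small sketch. The function name `decide` and its return strings are hypothetical and chosen only for illustration; the ordering (window check first, then threshold) is an assumption based on the description that bugs older than max_days_since_fix are no longer polled.

```python
def decide(days_since_fix: int, crashes_since_fix: int,
           min_crash_count: int = 5, max_days_since_fix: int = 10) -> str:
    """Illustrative outcome for one daily run, with the four-day floor removed."""
    # Past the upper bound the bug is no longer selected at all.
    if days_since_fix > max_days_since_fix:
        return "stop_polling"
    # No lower bound on age: fire as soon as the threshold is crossed.
    if crashes_since_fix >= min_crash_count:
        return "needinfo"
    # Below the threshold and still inside the window: check again tomorrow.
    return "keep_polling"
```

Under these assumptions a fast-burning regression with 7 crashes one day after the fix yields `"needinfo"` immediately, while a bug with 2 crashes after 3 days stays in `"keep_polling"` until it either crosses the threshold or ages out.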